Weighted Sampled Split Learning (WSSL): Balancing Privacy, Robustness, and Fairness in Distributed Learning Environments
This study presents Weighted Sampled Split Learning (WSSL), an innovative
framework tailored to bolster privacy, robustness, and fairness in distributed
machine learning systems. Unlike traditional approaches, WSSL disperses the
learning process among multiple clients, thereby safeguarding data
confidentiality. Central to WSSL's efficacy is its utilization of weighted
sampling. This approach ensures equitable learning by tactically selecting
influential clients based on their contributions. Our evaluation of WSSL
spanned various client configurations and employed two distinct datasets: Human
Gait Sensor and CIFAR-10. We observed three primary benefits: heightened model
accuracy, enhanced robustness, and maintained fairness across diverse client
compositions. Notably, our distributed frameworks consistently surpassed
centralized counterparts, registering accuracy peaks of 82.63% and 75.51% for
the Human Gait Sensor and CIFAR-10 datasets, respectively. These figures
contrast with the top accuracies of 81.12% and 58.60% achieved by centralized
systems. Collectively, our findings champion WSSL as a potent and scalable
successor to conventional centralized learning, marking it as a pivotal stride
forward in privacy-focused, resilient, and impartial distributed machine
learning.
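The contribution-weighted client selection described above can be sketched in a few lines; this is an illustrative stdlib sketch, not the authors' implementation, and the client names and contribution scores are hypothetical.

```python
import random

def sample_clients(clients, weights, k):
    """Sample k distinct clients with probability proportional to weight
    (weighted sampling without replacement, illustrative)."""
    chosen = []
    pool = list(zip(clients, weights))
    for _ in range(min(k, len(pool))):
        total = sum(w for _, w in pool)
        r = random.uniform(0, total)
        acc = 0.0
        for i, (c, w) in enumerate(pool):
            acc += w
            if r <= acc:
                chosen.append(c)
                pool.pop(i)
                break
    return chosen

# hypothetical per-client contribution scores (e.g. recent validation accuracy)
contributions = {"c1": 0.82, "c2": 0.41, "c3": 0.77, "c4": 0.15}
selected = sample_clients(list(contributions), list(contributions.values()), k=2)
print(selected)  # two distinct clients; higher-weight clients are favored
```

Sampling without replacement keeps each round's participant set diverse while still biasing the split-learning rounds toward the most influential clients.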
Trust-Based Cloud Machine Learning Model Selection For Industrial IoT and Smart City Services
With Machine Learning (ML) services now used in a number of mission-critical
human-facing domains, ensuring the integrity and trustworthiness of ML models
becomes all-important. In this work, we consider the paradigm where cloud
service providers collect big data from resource-constrained devices for
building ML-based prediction models that are then sent back to be run locally
on the intermittently-connected resource-constrained devices. Our proposed
solution comprises an intelligent polynomial-time heuristic that maximizes the
level of trust of ML models by selecting and switching between a subset of the
ML models from a superset of models in order to maximize the trustworthiness
while respecting the given reconfiguration budget/rate and reducing the cloud
communication overhead. We evaluate the performance of our proposed heuristic
using two case studies. First, we consider Industrial IoT (IIoT) services, and
as a proxy for this setting, we use the turbofan engine degradation simulation
dataset to predict the remaining useful life of an engine. Our results in this
setting show that the trust level of the selected models is only 0.49% to 3.17%
lower than that obtained using Integer Linear Programming (ILP).
Second, we consider Smart City services, and as a proxy for this setting, we
use an experimental transportation dataset to predict the number of cars. Our
results show that the selected model's trust level is only 0.7% to 2.53% lower
than that obtained using ILP. We also show that our proposed heuristic achieves
an optimal competitive ratio in a polynomial-time approximation scheme for the
problem.
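A greedy variant of this budget-constrained model selection can be sketched as follows; the trust matrix, time horizon, and switch-only-when-better rule are illustrative assumptions, not the paper's actual heuristic.

```python
def select_models(trust, budget):
    """Greedy sketch: trust[t][m] is the trust of model m at step t.
    Start on the most trusted model, then switch to the current best
    model only while the reconfiguration budget remains."""
    schedule = [max(range(len(trust[0])), key=lambda m: trust[0][m])]
    switches = 0
    for t in range(1, len(trust)):
        cur = schedule[-1]
        best = max(range(len(trust[t])), key=lambda m: trust[t][m])
        if best != cur and switches < budget:
            cur, switches = best, switches + 1
        schedule.append(cur)
    return schedule

# two models over three steps; a budget of 1 permits a single switch
trust = [[0.9, 0.5], [0.4, 0.8], [0.85, 0.3]]
print(select_models(trust, budget=1))  # → [0, 1, 1]
```

The greedy rule trades a small trust loss (the 0.49%-3.17% gap reported against ILP) for polynomial running time and far fewer cloud round-trips.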
Parameters Optimization of Deep Learning Models using Particle Swarm Optimization
Deep learning has been successfully applied in several fields such as machine
translation, manufacturing, and pattern recognition. However, successful
application of deep learning depends upon appropriately setting its parameters
to achieve high quality results. The number of hidden layers and the number of
neurons in each layer of a deep machine learning network are two key
parameters, which have main influence on the performance of the algorithm.
Manual parameter setting and grid search approaches somewhat ease the users'
tasks in setting these important parameters. Nonetheless, these two techniques
can be very time consuming. In this paper, we show that the particle swarm
optimization (PSO) technique holds great potential to optimize parameter
settings and thus saves valuable computational resources during the tuning
process of deep learning models. Specifically, we use a dataset collected from
a Wi-Fi campus network to train deep learning models to predict the number of
occupants and their locations. Our preliminary experiments indicate that PSO
provides an efficient approach for tuning the optimal number of hidden layers
and the number of neurons in each layer of the deep learning algorithm when
compared to the grid search method. Our experiments illustrate that the effort
spent exploring the landscape of configurations to find the optimal parameters
is reduced by 77%-85%. In fact, PSO yields even better accuracy results.
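A minimal PSO loop over the two integer hyperparameters (hidden-layer count and neurons per layer) might look like the sketch below; the bounds, swarm size, inertia/acceleration constants, and toy fitness function are assumptions for illustration, standing in for an actual model-training objective.

```python
import random

def pso_tune(fitness, bounds, n_particles=8, iters=20, w=0.7, c1=1.5, c2=1.5):
    """Minimal PSO over integer hyperparameters (lower fitness is better)."""
    dim = len(bounds)
    pos = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness([round(x) for x in p]) for p in pos]
    g = pbest_f.index(min(pbest_f))
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = fitness([round(x) for x in pos[i]])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return [round(x) for x in gbest], gbest_f

# toy fitness: pretend the best configuration is 3 hidden layers of 64 neurons
best, err = pso_tune(lambda p: (p[0] - 3) ** 2 + (p[1] - 64) ** 2,
                     bounds=[(1, 6), (8, 128)])
print(best)
```

In the real tuning setting, the fitness call would train and validate a model for the candidate (layers, neurons) pair, which is exactly the expensive evaluation that PSO's guided search reduces relative to an exhaustive grid.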
CCTFv1: Computational Modeling of Cyber Team Formation Strategies
Rooted in collaborative efforts, cybersecurity spans the scope of cyber
competitions and warfare. Despite extensive research into team strategy in
sports and project management, empirical study in cybersecurity is minimal.
This gap motivates this paper, which presents the Collaborative Cyber Team
Formation (CCTF) Simulation Framework. Using Agent-Based Modeling, we delve
into the dynamics of team creation and output. We focus on exposing the impact
of structural dynamics on performance while controlling other variables
carefully. Our findings highlight the importance of strategic team formations,
an aspect often overlooked in corporate cybersecurity and cyber competition
teams.
Machine Learning-Based Peripheral Artery Disease Identification Using Laboratory-Based Gait Data
Peripheral artery disease (PAD) manifests from atherosclerosis, which limits blood flow to the legs and causes changes in muscle structure and function, and in gait performance. PAD is underdiagnosed, which delays treatment and worsens clinical outcomes. To overcome this challenge, the purpose of this study is to develop machine learning (ML) models that distinguish individuals with and without PAD. This is the first step to using ML to identify those with PAD risk early. We built ML models based on previously acquired overground walking biomechanics data from patients with PAD and healthy controls. Gait signatures were characterized using ankle, knee, and hip joint angles, torques, and powers, as well as ground reaction forces (GRF). ML was able to classify those with and without PAD using Neural Networks or Random Forest algorithms with 89% accuracy (0.64 Matthews Correlation Coefficient) using all laboratory-based gait variables. Moreover, models using only GRF variables provided up to 87% accuracy (0.64 Matthews Correlation Coefficient). These results indicate that ML models can classify those with and without PAD using gait signatures with acceptable performance. Results also show that an ML gait signature model that uses GRF features delivers the most informative data for PAD classification.
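The Matthews correlation coefficient reported alongside accuracy is the standard confusion-matrix summary that stays informative when the PAD and control groups are imbalanced; it can be computed as below. The counts are illustrative, not the study's data, and show how the same 89% accuracy can correspond to a much weaker MCC on a skewed cohort.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# illustrative imbalanced cohort: only 11 of 100 subjects are positive
tp, tn, fp, fn = 5, 84, 5, 6
acc = (tp + tn) / (tp + tn + fp + fn)
print(round(acc, 2), round(mcc(tp, tn, fp, fn), 2))  # → 0.89 0.42
```

Because MCC uses all four cells of the confusion matrix, a classifier that mostly predicts the majority (control) class scores high accuracy but a low MCC, which is why the study reports both.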